Abstract

To take advantage of the full potential of ubiquitous 
computing, we will need systems which minimize power- 
consumption. Weiser et al. and others have suggested 
that this may be accomplished by a CPU which dynam- 
ically changes speed and voltage, thereby saving energy 
by spreading run cycles into idle time. Here we continue 
this research, using a simulation to compare a number of 
policies for dynamic speed-setting. Our work clarifies a 
fundamental power vs. delay tradeoff, as well as the role 
of prediction and of smoothing in dynamic speed-setting 
policies. We conclude that success seemingly depends 
more on simple smoothing algorithms than on sophisti- 
cated prediction techniques, but defer to the replication 
of these results on future variable-speed systems. 



1 Introduction 

Recent developments in ubiquitous computing make 
it likely that the future will see a proliferation of cord- 
less computing devices. Clearly it will be advantageous 
for such devices to minimize power-consumption. The 
top power-consumers in a computer system are the dis- 
play (68%), the disk (20%), and the CPU (12%) [6]. 
There is seemingly little which can be done to min- 
imize screen power-consumption, beyond employing a 
screen-saver and waiting for hardware improvements. 
Disk power-consumption may be minimized by spinning
down the disk when it has been inactive for several
seconds; [3, 6, 7] have researched this topic.

In the future we may well see ubiquitous computing
devices with neither disks nor conventional displays;
and, for such devices, minimizing the power-consumption
of the CPU will be particularly critical.
Methods for saving CPU power have been suggested by 
[2, 4, 9]. They point out that some CPUs can run at a
range of possible speeds, and that voltage may then be
decreased as speed decreases. From [9] we inherit the
assumption that, over an idealized device's voltage range
[V_min, V_max], voltage may be decreased in direct
proportion to speed: this is a valid first-order approximation
[1]. Now, a CPU, regarded as a capacitor-based system,
satisfies the physical law

$\text{energy/sec} \propto \text{voltage}^2 \cdot \text{speed}$

or equivalently 

$\text{energy/task} \propto \text{voltage}^2.$

And so it is possible to save on overall energy-usage 
by reducing voltage. In particular, if voltage may be 
reduced in direct proportion to speed, then 

$\text{energy/task} \propto \text{speed}^2.$

Hence it would be advantageous to have a CPU capable
of dynamic speed-setting: such a CPU could well
decrease power-usage without inconvenience to the user.
For example, a CPU might normally respond to a user's
command by running at full speed for 0.001 seconds,
then waiting idle; running at one-tenth speed, the CPU
could complete the same task in 0.01 seconds, thereby
saving energy without generating noticeable delay.
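
The arithmetic behind this example can be sketched in a few lines of Python. This is only an illustration of the idealized proportionalities stated above (voltage proportional to speed, no lower voltage limit); the function names are hypothetical and do not come from the simulator.

```python
def relative_power(speed):
    """Energy per second relative to full speed, assuming voltage is
    proportional to speed: power ~ voltage^2 * speed = speed^3."""
    return speed ** 3


def relative_task_energy(speed):
    """Energy for one fixed task relative to running it at full speed.

    The task takes 1/speed times as long, so
    energy ~ power * time = speed^3 / speed = speed^2.
    """
    duration_factor = 1.0 / speed
    return relative_power(speed) * duration_factor


# The 0.001 second task from the text, stretched to 0.01 seconds.
full_speed_seconds = 0.001
speed = 0.1
stretched_seconds = full_speed_seconds / speed
energy_ratio = relative_task_energy(speed)
```

Under these assumptions the one-tenth-speed run uses about one hundredth of the energy of the full-speed run. A real device with a minimum voltage would save less.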

Essential performance factors of a dynamic speed- 
setting policy are power-savings and delay. To save 
power, a CPU would ideally run at a flat, average speed.
But this would result in unacceptable delay; hence a 
tradeoff between the two factors must be accomplished. 
The question of how to measure delay is found to be 
non-trivial, as is the question of how much delay is ac- 
ceptable. Ideally, we would know the maximum allow- 
able delay for each process; but in existing systems such 
information is not generally available. 

In seeking to strike an optimal balance between low
power-consumption and low delay, an algorithm must 
consider issues of prediction and smoothing. Due 
to pragmatic limits on the frequency with which CPU 
speed can be changed, a speed-setting policy must first 
predict how busy the CPU will be in the near future. 
Then, given this prediction, the policy must make a 
decision aimed at smoothing speed. For example, if a 
peak in CPU usage is predicted, the policy might in- 
crease speed, but it might also keep speed low, thereby 
evening out speed at the cost of increasing delay. 

The distinction between prediction and smoothing is 
somewhat subjective. For instance, a speed-setting al- 
gorithm which strongly attempts to set a flat, average 
speed may be thought of in terms of prediction (it al- 
ways predicts that the future will be like the average) 
or in terms of smoothing (it smoothes to the greatest 
extent possible). Nevertheless, we will attempt to sepa- 
rate the two functions to some extent, trying to under- 
stand the utility (or lack thereof) of several algorithms 
for prediction and for smoothing. 

Weiser et al. [9] present only one practical speed-setting
policy, PAST. PAST's prediction algorithm is
elementary, and its smoothing somewhat ad hoc. Hy- 
pothesizing that more sophisticated prediction methods 
will allow for substantially improved performance, we 
here set out to test several new policies. 

In Section 2, we review the simulation model employed
by Weiser et al. In Section 3, we indicate how
we have altered this model. In Section 4, we present a 
number of speed-setting policies. In Section 5, we ana- 
lyze the performance of these policies. Finally, in Sec- 
tion 6, we present our conclusions and suggest avenues 
for future research. 

2 Previous work 

From Weiser et al. [9] we inherit a simulator, and
with it a number of assumptions. 

Only the CPU's energy-usage is studied; there is no 
consideration of the energy costs which CPU-slowing 
could generate elsewhere in the system. Indeed, due to 
the radically different time-frames involved, it is rea- 
sonable to focus on the CPU alone. For example, CPU- 
slowing could stretch a 0.001 second task to 0.01 sec- 
onds, whereas disk spindown might typically take place 
after 2-300 seconds of disk idle [6]; hence we may well 
expect that the former will have no substantial effect on 
the latter. In order to support this argument, our poli- 
cies will explicitly limit delay to a brief interval-length. 

The simulator takes as input traces of CPU-usage 
for a workstation running standard applications. No 
attempt is made to capture the unique workload (if any) 
of a ubiquitous computing device. 



Trace data is first divided into uniform-length time
intervals. For each interval, one computes the run_percent:
the fraction (on range [0, 1]) of cycles in which
the CPU is active. Figures 1 and 2 give examples of
such data for interval_lengths 0.01 seconds and 0.05
seconds. Not surprisingly, the run_percent values are
more bursty for the smaller interval_length.

It is assumed that, for a given interval, speed may be
set to any real number on range [min_speed, 1], where
1 represents full speed. Weiser et al. compile data for
min_speeds 0.2, 0.44, and 0.66 (corresponding to idealized
CPUs with full voltage 5.0 V and minimum voltages
1.0 V, 2.2 V, and 3.3 V). It is also assumed that speed
changes have no time or energy cost.

Moreover, it is assumed that no energy is expended 
during intervals in which no work is done; i.e., the ability 
to dynamically switch the CPU on/off is implicit. Such 
a capability is indeed technically feasible: for example, 
the Intel Pentium Processor can, in < 50 μsec, enter a
low-current sleep state in which power-usage drops 85%
[5, Section 5.3]. In the energy-usage graphs of Weiser et
al., and in our own, the maximum energy level (energy
usage = 1) already reflects savings due to a zero-power 
sleep state. The extent of this initial saving is not here 
studied, but may well be substantial. 

Each Weiser et al. simulation runs a policy on trace
data ten times, using interval_lengths 0.001, 0.005, 0.01,
0.02, 0.03, 0.05, 0.1, 0.25, 0.5, and 1.0 seconds. When,
during each run, speed is not fast enough to complete
an interval's work, excess_cycles spill over into later
intervals. To summarize the simulation results,
energy-usage is plotted as a function of interval_length. The
simulator also computes delay_penalty, a somewhat
subjective measure of total delay determined by reference
to all values of excess_cycles.

If an optional flag is set, the simulator attempts to
divide idle time into soft_idle, into which run_cycles may
allowably be stretched, and hard_idle, which must be
left intact. For example, it is valid to stretch run_cycles
into time spent waiting for the user's next command
(soft_idle), but not valid to stretch a process into time
during which it must wait after requesting data from
disk (hard_idle).

The Weiser et al. policies may change CPU speed
only at the start of those intervals containing a process- 
start or process-stop event. Thus speed may not be 
recomputed for intervals in the midst of long runs or 
long idles. 

The simulator cannot model event reordering due to 
speed changes, and cannot identify situations in which 
delays could be invidiously additive. This seems an inherent
difficulty of simulating on limited trace data.

Figure 2: run_percent values. Trace emacs1, interval_length 0.05 seconds.

3 Simulation model

We employ the same traces used by Weiser et al. Our
simulator is based on theirs, but has been changed in 
several respects. 

• All of our simulations are for min_speed = 0.2.

• To speed up the simulation, we did not employ the
0.001 second interval_length.

• An error was corrected: due to a programming bug,
the Weiser et al. simulator was overly optimistic about
the amount of work which could be completed in certain
intervals.¹

• The Weiser et al. option of dividing idle time into
hard_idle and soft_idle seemed degenerate, often failing
to identify any significant amount of soft_idle. Most of
our simulations are therefore run without the hard/soft
option. However, we later confirm that runs with the
hard/soft option yield similar results.

• We recompute speed at the beginning of each interval,
even if the interval is in the midst of a long run or
idle. This policy could be regarded as less efficient than
that employed by Weiser et al., as its implementation
would require additional interrupts. On the other hand,
we feel the Weiser et al. model to be unrealistic: for it
gives its speed-setting policy premature knowledge that
a long run has begun. Moreover, we wished to create
and analyze policies in a way more suited to a uniform,
per-interval speed-setting.²

• Rather than plotting our results as power vs.
interval_length, we plot power vs. a delay measure. Thus
we attempt to focus more clearly on the power vs. delay
tradeoff, regarding interval_length as a merely internal
parameter.

In theory the two plotting methods are related, as the
Weiser et al. PAST policy attempts to limit delay to an
interval_length. However, as excess_cycles are allowed
to spill over into future cycles, true delay is actually an
unclear function of interval_length and delay_penalty.
The importance of delay_penalty is clearly indicated by
the fact that Weiser et al.'s FUTURE, an artificial policy
which has perfect knowledge of the coming interval,
nevertheless uses more energy than PAST because it is
not allowed to transfer excess_cycles between intervals.

¹ This bug may be observed in the "Speed Setting Algorithm"
of [9, Section 6]. In the second assignment statement,
excess_cycles from the previous interval are added into run_cycles. In
the fourth assignment statement, the time available to do work
in the current interval is then given as (run_cycles + soft_idle);
and the adjusted rather than raw value of run_cycles is mistakenly
used here. Hence results may be substantially over-optimistic
when excess_cycles is high.

² We also use this revised simulation method when running
Weiser et al.'s policy PAST for purposes of comparison. We intend
this as a measure friendly to PAST, as its performance improves
due to this modification (as measured by the revised delay metric
described below).

Instead of trying to combine interval_length and
delay_penalty into a meaningful composite number, we
substituted our own measure of delay, which is illustrated
in Figure 3. For a given CPU task, imagine plotting
the amount of work that remains to be done as a
function of time. The lower line in Figure 3 corresponds
to the CPU running at full speed; the upper line illustrates
the same task being run slowly and intermittently
on a variable-speed CPU. We take the area between the
two lines as a measure of the task's delay; we then divide
the sum of all the "delay areas" in the trace by the sum
of all the "full speed areas" to derive a figure for average
delay. This measure, while still arbitrary, has some
sophistication: it fairly represents delay both within and
between intervals, it is not unduly sensitive to arbitrary
time-slicing of tasks into smaller tasks, and it considers
a task which has nearly been completed before a delay
(and so may already have generated child processes
or critical portions of output) as having a lower delay
figure than a task delayed at its start.
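
As a rough illustration of this area-based measure, the following Python sketch computes the delay figure for a single task run under a piecewise-constant speed schedule. It is a sketch under stated assumptions, not the simulator's code: the function names are hypothetical, work is measured in seconds of full-speed execution, and how the real simulator splits a trace into tasks is not modeled here.

```python
def remaining_work_area(work, speeds, dt):
    """Area under the 'work remaining vs. time' curve for one task.

    work   -- task size in seconds of full-speed execution
    speeds -- speed used in each successive interval (0 means idle)
    dt     -- interval length in seconds
    """
    area = 0.0
    remaining = work
    for speed in speeds:
        if remaining <= 0:
            break
        capacity = speed * dt  # work this interval can absorb
        if capacity >= remaining:
            # Task finishes partway through the interval: a triangle.
            area += 0.5 * remaining * (remaining / speed)
            remaining = 0.0
        else:
            # Task continues: a trapezoid over the whole interval.
            area += (remaining - 0.5 * capacity) * dt
            remaining -= capacity
    if remaining > 0:
        raise ValueError("speed schedule too short to finish the task")
    return area


def delay_factor(work, speeds, dt):
    """Delay area divided by the full-speed area for a single task."""
    full_speed_area = 0.5 * work * work
    slow_area = remaining_work_area(work, speeds, dt)
    return (slow_area - full_speed_area) / full_speed_area


# A 10 second task (as in Figure 3) run at a constant half speed.
half_speed_delay = delay_factor(10.0, [0.5] * 20, 1.0)
```

For a whole trace, the text divides the sum of the delay areas over all tasks by the sum of their full-speed areas, rather than averaging per-task ratios.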

4 Speed-setting policies 

4.1 PAST 

PAST is the only practical speed-setting policy proposed
by Weiser et al. We employ it for purposes of
comparison.³

• PAST:

- Prediction: PAST calculates how busy the
last completed interval was (including
excess_cycles brought into that interval). It
then predicts that the coming interval will be
equally busy.

- Speed-setting: If the prediction is for a
busy interval, PAST increases speed; if for a
mostly idle interval, PAST decreases speed.
Some smoothing is accomplished by limiting
the amount by which speed can change (except
that speed may be increased to 1 if
excess_cycles rises particularly high).⁴

Since PAST attempts to complete work within the
interval after that which generated it, it is not surprising
that delay rises and power-usage falls as
interval_length increases. Weiser et al. identify the range of
interval_lengths from 0.01 seconds to 0.05 seconds as one
in which delay and energy-savings both seem acceptable.
For a simple algorithm, PAST does surprisingly
well.

³ It should be noted, however, that PAST is evidently only
intended as a reasonable first-version policy. Moreover, our
correction of a simulator bug, as noted in Section 3, has affected the
performance of the policy in ways to which Weiser et al. did not
have the opportunity to respond.

⁴ Refer to [9] for full details.

Figure 3: Graph of run_cycles not yet completed as a function of time, for a hypothetical process that normally takes
10 seconds to execute. The bottom line is for process execution at full speed; the top line, for process execution at
a slower, intermittent speed.



Nevertheless, we felt that there was room for im- 
provement here. We disagree with PAST's prediction 
algorithm: in a bursty trace such as that illustrated in 
Figure 1, the assumption that adjacent intervals will 
be similar is in fact almost certainly wrong. Moreover, 
PAST considers only the excess_cycles that went into
the last completed interval, ignoring the seemingly more
valuable figure of excess_cycles being pushed out from
that interval into the coming interval. Since it looks
back only one interval, PAST smoothes speed poorly; 
and the attempt to patch this problem with an arbitrary 
limit on speed-change seems ad hoc — and dangerous, in 
that a process can be delayed several intervals while the 
system slowly banks up speed. The behavior of PAST 
can be strange: given uniform input data, it can thrash 
speed without coming to a limit; and, due to mistaken 
speed-setting decisions, it saves little more energy when 
min_speed is 0.2 than when it is 0.44. Finally, we feel
that the role of interval_length is confused, as it influences
the outcome of the simulation in three different
ways, determining (1) the frequency with which speed 
can be changed, (2) the acceptable amount of delay, and 
(3) how far back PAST looks when making its predic- 
tions. We would wish to see a clearer separation of these 
distinct functionalities. 

In short, we felt that it should be possible to cre- 
ate stronger policies with more sophisticated prediction 
heuristics. 

4.2 FLAT 

Our first policy is FLAT. Weak on prediction, it simply
tries to smooth speed to a global average. FLAT
takes an input parameter (const), which must be a real
number on range [0, 1].

• FLAT (const):

- Prediction: Predict the new run_percent to
be (const).

- Speed-setting: Set speed fast enough to
complete the predicted new work plus the
excess_cycles being pushed into the coming
interval (subject, of course, to the limit of full
speed = 1).

While FLAT wants to keep run_percents as flat as
possible, it also responds effectively to excess_cycles.
Indeed, speed is always set fast enough to complete at
least the excess_cycles, so no work may be delayed more
than one interval. The same speed-setting rule will be
employed in most of our policies.
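
The FLAT rule is simple enough to sketch directly. The Python below is a minimal illustration, not the simulator's code: it assumes that run_percent and excess_cycles are both expressed as fractions of one full-speed interval, that all idle time may be used for stretched work, and that speed is clamped to [min_speed, 1]. The function names are hypothetical.

```python
def flat_speed(const, excess, min_speed=0.2):
    """Speed for the coming interval: predicted work plus carried-over work,
    clamped to the allowed speed range."""
    demand = const + excess
    return max(min_speed, min(1.0, demand))


def run_flat(run_percents, const, min_speed=0.2):
    """Apply FLAT to a sequence of per-interval run_percents.

    Returns the speed chosen for each interval and the work still
    unfinished after the last interval.
    """
    excess = 0.0
    speeds = []
    for run_percent in run_percents:
        speed = flat_speed(const, excess, min_speed)
        speeds.append(speed)
        work = run_percent + excess          # work present in this interval
        excess = max(0.0, work - speed)      # what spills into the next one
    return speeds, excess


# One fully busy interval followed by two idle ones, with (const) = 0.4.
speeds, leftover = run_flat([1.0, 0.0, 0.0], const=0.4)
```

In this small run the work that FLAT could not finish in the busy interval is picked up in the following interval, after which speed falls back to (const).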



4.3 LONG_SHORT

LONG_SHORT is a more predictive policy, one which
attempts to find a golden mean between local behavior
and a more long-term average. Parameter (const) is a
non-negative real number which, as it is increased, gives
more weight to local behavior. We hoped to discover an
optimal value of (const) at which LONG_SHORT would
predict accurately by giving the best possible weight
to local behavior. This may alternatively be thought
of in terms of smoothing: LONG_SHORT attempts to
smooth to a global average, but shows some respect for
local peaks.






• LONG_SHORT (const):

- Prediction: Look up the last 12 run_percents
(and, for this policy, we include in each
run_percent the excess_cycles pushed into the
interval). The 3 most recent run_percents
constitute the short-term past; the remaining 9,
the long-term past. Our prediction for the
coming run_percent is then a weighted sum of
these 12 values (normalized by the total weight),
where each short-term value is given weight (const),
each long-term value weight 1.

- Speed-setting: Set speed fast enough to
complete the predicted work.

For example, if (const) = 4 and the last 12 run_percents,
including excess_cycles, are 0 → .3 → .5 → 1 →
1 → 1 → .8 → .5 → .3 → .1 → 0 → 0, then we would
set speed to

$\frac{0 + .3 + .5 + 1 + 1 + 1 + .8 + .5 + .3 + 4(.1 + 0 + 0)}{9 + 4 \cdot 3} = 0.276.$

Note that LONG_SHORT, an early policy, is less elegant
than FLAT and more like PAST, particularly in
that the speed it sets is affected by excess_cycles only
indirectly.
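
The worked example can be checked with a short Python sketch. The function name is hypothetical, the history is assumed to be ordered oldest first, and the handling of histories shorter than 12 values is an assumption made for illustration only.

```python
def long_short_predict(history, const):
    """Weighted average of the last 12 run_percents (oldest first).

    The 3 most recent values get weight `const`; the 9 before them get
    weight 1. Shorter histories simply use whatever values exist.
    """
    recent = history[-12:]
    long_term, short_term = recent[:-3], recent[-3:]
    weighted_sum = sum(long_term) + const * sum(short_term)
    total_weight = len(long_term) + const * len(short_term)
    if total_weight == 0:
        return 0.0  # no usable history (or all weights zero)
    return weighted_sum / total_weight


# The example from the text: (const) = 4.
history = [0, .3, .5, 1, 1, 1, .8, .5, .3, .1, 0, 0]
prediction = long_short_predict(history, const=4)
```

With these inputs the weighted sum is 5.8 and the total weight is 21, which reproduces the 0.276 figure above.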

4.4 AGED_AVERAGES

A cleaner variant of LONG_SHORT, AGED_AVERAGES
employs an exponential-smoothing method,
attempting to predict via a weighted average: one which
geometrically reduces the weight given to each previous
interval as we go back in time. Parameter (const), a real
number on range (0, 1], determines the rate of geometric
reduction.

• AGED_AVERAGES (const):

- Prediction: The predicted new run_percent
is a weighted average of all previous
run_percents, where the weight given to an
interval's data is multiplied by a factor (const) for
each 0.01 seconds that we go back in time.

- Speed-setting: Set speed fast enough to
complete the predicted new work plus
excess_cycles.

For example, if interval_length is 0.01 seconds and the
previous run_percents are $P_t, P_{t-1}, P_{t-2}, \ldots$,
then the predicted new run_percent would be

$\frac{P_t + c\,P_{t-1} + c^2 P_{t-2} + \cdots}{1 + c + c^2 + \cdots}$, where $c$ = (const).

Note that (const) is defined in such a manner that aging
will be essentially independent of interval_length;
this we regard as a step toward reducing the confused
multiple effects of interval_length.

We hoped that AGED_AVERAGES, like LONG_SHORT,
would work best for a particular value of
(const) at which it would optimally balance the
long-term and the short-term past.
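
A minimal Python sketch of this aged average follows. It assumes the most recent interval has relative weight 1 (only the ratios between weights matter) and that the history is ordered oldest first; the function name is hypothetical.

```python
def aged_average_predict(history, const, interval_length=0.01):
    """Geometrically aged average of all previous run_percents.

    The weight shrinks by a factor `const` for every 0.01 seconds of age,
    so one interval of age costs const ** (interval_length / 0.01).
    """
    per_interval_factor = const ** (interval_length / 0.01)
    weighted_sum = 0.0
    total_weight = 0.0
    weight = 1.0
    for run_percent in reversed(history):  # newest value first
        weighted_sum += weight * run_percent
        total_weight += weight
        weight *= per_interval_factor
    if total_weight == 0:
        return 0.0  # empty history
    return weighted_sum / total_weight


# Two intervals of history: an idle one followed by a fully busy one.
recent_biased = aged_average_predict([0.0, 1.0], const=0.5)
plain_mean = aged_average_predict([0.0, 1.0], const=1.0)
```

Setting (const) = 1 gives every interval the same weight, so the prediction reduces to the unweighted average of all run_percents seen so far.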

4.5 CYCLE 

We now experiment with more sophisticated prediction
algorithms. The CYCLE policy was inspired by
run_percent plots such as Figure 2. Observe that these
run_percent values look quite cyclical. Can we take
advantage of such cycling to predict?

• CYCLE (const):

- Prediction: Examine the last 16 run_percents.
Does there exist X ∈ {1, 2, ..., 8} such
that the last 2X values approximately repeat a
cycle of length X? If so, predict by extending
this cycle. Or, if no good cycle is found,
predict the new run_percent to be a flat (const).

- Speed-setting: Set speed fast enough to
complete the predicted new work plus
excess_cycles.

To be more exact, we evaluate potential cycles by
computing an error-measure equal to the mean of the
absolute differences between all pairs of run_percents
which should match. For example, say that the last
eight run_percents are 0 → .4 → .8 → .1 → .3 → .5 →
.7 → 0. Then an alleged cycle of length 4 would
correspond to the following matchup:

0 .4 .8 .1
.3 .5 .7 0

and our error-measure for this matchup would be

$\frac{|0 - .3| + |.4 - .5| + |.8 - .7| + |.1 - 0|}{4} = 0.15.$

We then predict according to the cycle with the lowest
error-measure (if our example length-4 cycle was found
to be best, our predicted new run_percent would be .3);
or, if every potential cycle has error-measure > 0.2, we
predict run_percent to be (const).

Observe that CYCLE behaves like FLAT except
when it prefers to make a "smarter" guess by reference
to a discovered cycle.
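
The cycle search can be sketched as follows. This is an illustrative Python version with hypothetical function names; it assumes the history is ordered oldest first and that ties between cycle lengths go to the shorter cycle, which the text does not specify.

```python
def cycle_error(values, length):
    """Mean absolute difference between the last `length` values and the
    `length` values immediately before them."""
    last = values[-length:]
    previous = values[-2 * length:-length]
    return sum(abs(a - b) for a, b in zip(previous, last)) / length


def cycle_predict(history, const, max_length=8, threshold=0.2):
    """Extend the best-matching cycle, or fall back to a flat `const`."""
    best_length, best_error = None, None
    for length in range(1, max_length + 1):
        if len(history) < 2 * length:
            break  # not enough data to test longer cycles
        error = cycle_error(history, length)
        if best_error is None or error < best_error:
            best_length, best_error = length, error
    if best_length is None or best_error > threshold:
        return const
    # Extending a cycle of length X repeats the value seen X intervals ago.
    return history[-best_length]


# The eight run_percents from the worked example.
values = [0, .4, .8, .1, .3, .5, .7, 0]
error_4 = cycle_error(values, 4)
prediction = cycle_predict(values, const=0.5)
```

On the example data the length-4 cycle has the lowest error of the four testable lengths, so the sketch predicts .3, matching the text.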

4.6 PATTERN 

CYCLE is generalized in PATTERN, which employs
a method reminiscent of branch prediction tables. Here
we attempt to identify the most recent run_percent values
as repeating a pattern seen earlier in the trace.

Let an interval of type A be one in which run_percent
is between 0 and 0.25; type B, between 0.25 and 0.5;
type C, between 0.5 and 0.75; type D, between 0.75 and
1. Then if, for example, the most recent run_percents
are 0 → .13 → .28 → .33 → .52 → .79, this would
correspond to pattern AABBCD. If, looking back in
time, we find that the last occurrence of AABBCD was
followed by a fall back to A, we might guess that the
same is about to happen now (and so would predict
run_percent = 0.125, the middle of the A range).

Parameter (const) is a positive integer determining
the pattern-length which PATTERN will employ.

• PATTERN (const):

- Prediction: Convert the (const) most recent
run_percents into a pattern in alphabet {A, B,
C, D}. Then continue backward in time until
you find another occurrence of this pattern.
Predict that the coming run_percent will
have the same magnitude-letter as that which
followed the previous instance of the pattern.
Predict run_percent to fall in the middle of
that letter's associated range (A ⇒ 0.125,
B ⇒ 0.375, C ⇒ 0.625, D ⇒ 0.875).

- Speed-setting: Set speed fast enough to
complete the predicted new work plus
excess_cycles.

This model of pattern-discovery is evidently partial
and arbitrary. Nevertheless, we hoped that, when
(const) was set optimally, repeated patterns (e.g.,
peaks of a certain common width) would be picked out
to good effect.
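
A compact Python sketch of this lookup is given below. Several details are assumptions made for illustration: values on a boundary are assigned to the higher range (with 1.0 counted as D), the earlier occurrence is allowed to overlap the current pattern, and the `fallback` argument covers the case where no earlier occurrence exists, which the text does not describe. All names are hypothetical.

```python
MIDPOINT = {"A": 0.125, "B": 0.375, "C": 0.625, "D": 0.875}


def magnitude_letter(run_percent):
    """Map a run_percent in [0, 1] to one of the four magnitude letters."""
    return "ABCD"[min(int(run_percent / 0.25), 3)]


def pattern_predict(history, const, fallback=0.5):
    """Predict the next run_percent from the most recent earlier occurrence
    of the current length-`const` pattern (history is oldest first)."""
    letters = "".join(magnitude_letter(rp) for rp in history)
    if len(letters) <= const:
        return fallback
    current = letters[-const:]
    # Search backward, excluding the final position so that any match
    # is followed by at least one already-observed interval.
    index = letters.rfind(current, 0, len(letters) - 1)
    if index == -1:
        return fallback
    return MIDPOINT[letters[index + const]]


# AABBCD, then a fall back to A, then AABBCD again.
history = [0, .13, .28, .33, .52, .79, .1, 0, .13, .28, .33, .52, .79]
pattern = "".join(magnitude_letter(rp) for rp in history[:6])
prediction = pattern_predict(history, const=6)
```

Here the earlier AABBCD was followed by an A interval, so the sketch predicts the middle of the A range.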

4.7 PEAK 

PEAK is a more specialized version of PATTERN.
It uses heuristics based on the expectation of narrow
peaks, such as those which occur frequently in Figure
1. Hence we expect rising run_percents to fall symmetrically
back down, falling run_percents to continue falling,
a sustained run_percent = 1 to fall but a sustained 0 to
hold steady.

• PEAK (const):

- Prediction: Let $P_{t-1}$, $P_t$ be the two most
recent run_percents, and let $P_{t+1}$ be our
prediction for the coming run_percent.

* If $P_t > P_{t-1}$, then $P_{t+1} := \max\{P_{t-1}, 0.1\}$.

* If $P_t < P_{t-1}$, then $P_{t+1} := \min\{P_t, 0.1\}$.

* If $P_t = P_{t-1}$, then, if $P_t = 1$, $P_{t+1} := 0.4$;
otherwise, $P_{t+1} := P_t$.

- Speed-setting: Set speed fast enough to
complete the expected new work plus a (const)
fraction of excess_cycles.

Observe that we have modified our usual speed-setting
policy of always trying to complete the expected
new work plus all excess_cycles; this policy was logical
but perhaps too cautious. Parameter (const) may now
be varied from 1, indicating that we employ our usual
speed-setting policy, to 0, indicating that we ignore the
excess_cycles entirely.
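
The three PEAK cases translate directly into code. The Python sketch below follows the rules as read from the (partly illegible) source, so the exact max/min forms should be treated as an assumption; the function names, the clamping to [min_speed, 1], and the units (fractions of one full-speed interval) are likewise illustrative.

```python
def peak_predict(previous, latest):
    """Predict the coming run_percent from the two most recent ones."""
    if latest > previous:
        # Rising: expect a symmetric fall back down, but not below 0.1.
        return max(previous, 0.1)
    if latest < previous:
        # Falling: expect the fall to continue, to at most 0.1.
        return min(latest, 0.1)
    # Steady: a sustained full load is expected to ease; otherwise hold.
    return 0.4 if latest == 1.0 else latest


def peak_speed(previous, latest, excess, const, min_speed=0.2):
    """Speed covering the predicted work plus a `const` fraction of the
    excess carried into the coming interval."""
    demand = peak_predict(previous, latest) + const * excess
    return max(min_speed, min(1.0, demand))


rising = peak_predict(0.5, 0.8)
falling = peak_predict(0.8, 0.3)
saturated = peak_predict(1.0, 1.0)
speed = peak_speed(0.5, 0.8, excess=0.5, const=0.2)
```

With (const) = 0.2 only a fifth of the carried-over work is scheduled for the coming interval, which is the "lazier" behavior the text describes.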



5 Performance of our policies 

In Section 5.1, we test each of our policies in turn, 
comparing them to PAST and finding the optimal value 
for each policy's (const) parameter. Runs are on trace 
emacs1, a relatively short trace of typing into an emacs
buffer. In Section 5.2, we then evaluate the relative 
merits of our policies, and double-check with runs on a 
substantially different trace. 

5.1 Runs of each policy 

Figure 4 graphs the performance of FLAT running 
with several possible values of its (const) parameter; 
PAST is also provided for comparison. Observe that, for
each policy, results are presented for a selection of nine
interval_lengths; these generally form a curve, slanting
toward more delay and less energy-usage as
interval_length increases. The optimal algorithm that works
within a given delay limit is found by starting on the 
x-axis at the desired delay figure and moving vertically 
until one reaches the lowest curve. Thus policies whose 
results curve closer to the origin are superior, while sets 
of data points which seem "shifted" along a single curve 
represent different ranges of possible energy-usage and 
delay but a similar energy vs. delay tradeoff. 

We observe from Figure 4 that FLAT achieves optimality
when (const) ≈ 0.4. We also note that FLAT
is superior to PAST for all possible values of (const).⁵
Indeed, we will find this to be essentially true for all of 
our policies. 

Figure 5 graphs the performance of LONG_SHORT.
In comparison to FLAT, this policy is shifted toward
lower energy-usage and higher delay, possibly because
its speed-setting is less responsive to excess_cycles.
Results improve as (const) increases, leveling out around
(const) = 3 to 5. This seemingly indicates that prediction
based on the short-term past is particularly
advantageous.



⁵ This is not an artifact of our new delay measure. If one uses
Weiser et al.'s delay_penalty instead, the difference is only more
pronounced.

Figure 4: Performance of policy FLAT on trace emacs1. FLAT, run with several possible values of its parameter
(const), is compared to PAST. For each policy, interval_lengths of 0.005, 0.01, 0.02, 0.03, 0.05, 0.1, 0.25, 0.5, and 1.0
seconds are displayed connected.

Figure 5: Performance of policy LONG_SHORT on trace emacs1. LONG_SHORT, run with several possible values
of its parameter (const), is compared to PAST. For each policy, interval_lengths of 0.005, 0.01, 0.02, 0.03, 0.05, 0.1,
0.25, 0.5, and 1.0 seconds are displayed connected.

Figure 6 graphs the performance of AGED_AVERAGES.
We consider this result a disappointment because,
rather than indicating an optimal value for
(const), performance merely improves as (const)
increases. Thus AGED_AVERAGES works best when
(const) = 1, in which case it simply predicts by
calculating the unweighted average of all run_percents so
far. Thus, it seems that the best one can do here is
to predict via a global average, in which case
AGED_AVERAGES is little different from FLAT.

Figure 7 graphs the performance of CYCLE. Results
seem lackluster; (const) ≈ 0.5 is optimal.

Figure 8 graphs the performance of PATTERN. The
results here are particularly disappointing: there is
no significant change as (const) varies, suggesting that
meaningful patterns are not being found.

Figure 9 graphs the performance of PEAK. Results
seem strong, with PEAK achieving optimality when
(const) ≈ 0.2. Note, however, that PEAK performs
little better when (const) = 0.2 than when (const) =
1: there is, not surprisingly, a shift toward greater
delay and less energy-usage, but the energy vs. delay
tradeoff is not improved. This indicates that our
experiment of being lazier about the completion of
excess_cycles has had little effect. Thus, if PEAK proves
to be a strong policy, we should attribute this to the
prediction algorithm rather than to our experiment in
speed-smoothing.

5.2 Comparing the policies 

Figure 10 summarizes the performance of the best
policies from Figures 4-9. Surprisingly, the simplest
policy, FLAT 0.4, is optimal for delay values < 8, while
LONG_SHORT 3, which is scarcely more complex, is
optimal for the higher delay values. Of our more
sophisticated predicting algorithms, PEAK 0.2 does best,
coming close to equaling FLAT and LONG_SHORT
in the medium-delay range. AGED_AVERAGES, CYCLE,
and PATTERN are all disappointing. It is particularly
telling that CYCLE is consistently worse than
FLAT. For CYCLE imitates FLAT except when it is
"trying to be clever"; and so this result would suggest
that when CYCLE tries to be clever, the result is
generally for the worse.

To assure that the above results are not specific to
trace emacs1, we have duplicated the runs in Figure
10 on a quite different trace, kestrel.mar1; nearly
ten hours long, this trace is on a workload including
"software development, documentation, e-mail, simulation,
and other typical activities of engineering
workstations" [9]. The results are graphed in Figure 11.
Since the trace-data identified much of the idle time
in kestrel.mar1 as soft, we were able to do these runs
using the simulator option of stretching run_cycles only
into soft idle. Also note that delay factors are unusually
small for this trace, apparently because a long block
of full-run intervals, which our policies handle
near-optimally, dominates the delay measure.

In spite of these substantial differences, comparative
results for the various policies are similar to those
on emacs1. The main difference is that PEAK 0.2
has edged ahead of FLAT 0.4 and LONG_SHORT 3
to become the optimal algorithm in the medium-delay
range. It is notable that PEAK, having been designed
to work well with thin peaks, proves particularly
effective for small interval_lengths, at which such bursty
peaks are common. Figure 12 illustrates the superior
behavior of PEAK 0.2 relative to PAST by tracking
the speeds they respectively set on a stretch of
kestrel.mar1. These speeds are graphed along with
the effective run_percent, that is, run_cycles divided by
(run_cycles + soft idle). Note the comparative smoothness
and the greater correspondence to run_percent of
the PEAK speeds.

6 Conclusions and directions for 
future research 

We found that several of our predictive algorithms 
performed poorly; only PEAK exhibited strong perfor- 
mance. We might then conclude that simple algorithms 
based on rational smoothing rather than "smart" pre- 
dicting may be most effective. 

Nevertheless, further possibilities for prediction re- 
main to be tried. A policy might sort past informa- 
tion by process-type; it could then use its knowledge of 
the expected run-times of various types of processes to 
better predict the system's near-future computational 
needs. Moreover, each application could provide the 
system with useful information — stating how much it 
expects to load the system in the near future and how 
long a delay of a given process it would regard as accept- 
able. Indeed, communication of straightforward dead- 
lines to the system (a keystroke must be processed in 
0.01 seconds, and so on) would be an obvious compo- 
nent of an optimal speed-setting policy. 

Testing out such theories, however, would quickly go 
beyond the limits of a simulation. How soon, then, may 
we employ actual variable-speed systems? 

CPUs already exist which are certified to operate over
a range of possible voltages. The ARM processor may
operate at 2.5-3.6 V, while "a Motorola CMOS 6805
microcontroller (cloned by SGS-Thomson) is rated at 6
Mhz at 5.0 Volts, 4.5 Mhz at 3.3 Volts, and 3 Mhz at 2.2
Volts. This is a close to linear relationship between
voltage and clock rate" [9]. Thus there is seemingly no
technical objection to designing a variable-voltage system,
provided that the input reference voltage to the processor's
voltage regulator may be a digital word writable
by the processor.

Figure 6: Performance of policy AGED-AVERAGES on trace emacsl. AGED-AVERAGES, run with several possible
values of its parameter (const), is compared to PAST. For each policy, interval_lengths of 0.005, 0.01, 0.02, 0.03,
0.05, 0.1, 0.25, 0.5, and 1.0 seconds are displayed connected.

Figure 7: Performance of policy CYCLE on trace emacsl. CYCLE, run with several possible values of its parameter
(const), is compared to PAST. For each policy, interval_lengths of 0.005, 0.01, 0.02, 0.03, 0.05, 0.1, 0.25, 0.5, and 1.0
seconds are displayed connected.

Figure 8: Performance of policy PATTERN on trace emacsl. PATTERN, run with several possible values of its
parameter (const), is compared to PAST. For each policy, interval_lengths of 0.005, 0.01, 0.02, 0.03, 0.05, 0.1, 0.25,
0.5, and 1.0 seconds are displayed connected.

[Figure 9 plot: energy usage versus delay factor (log scale, 0.5 to 256); curves for past, peak0, peak.1, peak.2, and peak1.]

Figure 9: Performance of policy PEAK on trace emacsl. PEAK, run with several possible values of its parameter
(const), is compared to PAST. For each policy, interval_lengths of 0.005, 0.01, 0.02, 0.03, 0.05, 0.1, 0.25, 0.5, and 1.0
seconds are displayed connected.

Figure 10: Performance of various policies on trace emacsl. The best policies from Figures 4-9 are here compared to
PAST. For each policy, interval_lengths of 0.005, 0.01, 0.02, 0.03, 0.05, 0.1, 0.25, 0.5, and 1.0 seconds are displayed
connected.

Figure 11: Performance of various policies on trace kestrel.marl. The best policies from Figures 4-9 are here
compared to PAST. For each policy, interval_lengths of 0.005, 0.01, 0.02, 0.03, 0.05, 0.1, 0.25, 0.5, and 1.0 seconds
are displayed connected.

Figure 12: A stretch of the kestrel.marl trace, with effective run_percent (= run-cycles / (run-cycles + soft idle))
graphed alongside the resulting speeds set by PEAK 0.2 and by PAST. Interval_length is 0.005 seconds.

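
The "close to linear" relationship in the quoted 6805 ratings can be checked with a little arithmetic. The sketch below uses only the three voltage and clock pairs quoted from [9]. The relative energy column additionally assumes that energy per cycle scales with the square of the supply voltage, which is the usual idealization for CMOS and is labeled as an assumption here.

```python
# Ratings quoted in the text: (supply volts, rated clock in MHz).
ratings = [(5.0, 6.0), (3.3, 4.5), (2.2, 3.0)]

full_volts, full_mhz = ratings[0]


def summarize(volts: float, mhz: float) -> dict:
    """Ratios relative to the full-voltage rating."""
    return {
        "mhz_per_volt": mhz / volts,
        "relative_speed": mhz / full_mhz,
        # Assumption: energy per cycle is proportional to volts squared.
        "relative_energy_per_cycle": (volts / full_volts) ** 2,
    }


summary = [summarize(v, f) for v, f in ratings]
for (v, f), row in zip(ratings, summary):
    print(f"{v:.1f} V, {f:.1f} MHz: "
          f"{row['mhz_per_volt']:.3f} MHz/V, "
          f"speed {row['relative_speed']:.2f}, "
          f"energy/cycle {row['relative_energy_per_cycle']:.3f}")
```

The MHz-per-volt ratio stays between 1.2 and about 1.36 across the three ratings, which is the sense in which the relationship is close to linear.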

The main time-cost would be for the converter to
ramp the supply voltage and for the processor's
phase-locked loop, if present, to change clock frequency. Thus
ramping-time is determined by the time-constants of the
converter and the phase-locked loop, and so would be on
the order of tens of μsec [9, 5]. This time-scale seems
well suited to our policies, which allow > 5000 μsec
between speed changes. Moreover, the CPU should be
able to continue working during a voltage ramp; and
ramping should not have any substantial power-cost.
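
To put the two time-scales side by side, the fraction of an interval spent ramping can be estimated directly. The ramp durations below are hypothetical stand-ins for "tens of microseconds"; the 5000 microsecond interval is the shortest interval length used in the simulations (0.005 seconds).

```python
def ramp_fraction(ramp_usec: float, interval_usec: float) -> float:
    """Fraction of one speed-setting interval taken up by a voltage ramp."""
    if interval_usec <= 0:
        raise ValueError("interval must be positive")
    return ramp_usec / interval_usec


SHORTEST_INTERVAL_USEC = 5000.0  # 0.005 seconds

# Hypothetical ramp durations, in microseconds.
for ramp in (10.0, 50.0, 90.0):
    share = ramp_fraction(ramp, SHORTEST_INTERVAL_USEC)
    print(f"ramp {ramp:.0f} usec -> {share:.1%} of the interval")
```

Even the longest of these hypothetical ramps takes under 2% of the shortest interval, and the CPU is expected to keep working during the ramp in any case.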

Our results defer, then, to hopefully imminent studies 
of actual systems. 

We thank Marvin Theimer, Mark Weiser, Alan De- 
mers, Scott Shenker, Thomas Burd, and particularly 
Brent Welch, without whose help this project would not 
have been possible. 